Policy-based optimization: single-step policy gradient method seen as an evolution strategy

Abstract

This research reports on the recent development of black-box optimization methods based on single-step deep reinforcement learning and their conceptual similarity to evolution strategy (ES) techniques. It formally introduces policy-based optimization (PBO), a policy-gradient-based optimization algorithm that relies on the update of a policy network to describe the density function of its forthcoming evaluations, and uses covariance estimation to steer the improvement process in the right direction. The specifics of the PBO algorithm are detailed, and the connections to evolutionary strategies are discussed. Relevance is assessed by benchmarking PBO against classical ES techniques on analytic function minimization problems, and by optimizing various parametric control laws intended for the Lorenz attractor and the cartpole problem. Given the scarce existing literature on the topic, this contribution establishes PBO as a valid, versatile black-box optimization technique, and opens the way to multiple future improvements building on the inherent flexibility of the neural networks approach.
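The loop summarized in the abstract (draw candidates from a parameterized density, evaluate each one once, then update the density with a policy-gradient step) can be illustrated with a small sketch. This is not the authors' PBO. As simplifying assumptions, the policy network is replaced by a directly parameterized Gaussian with a diagonal covariance, and the score-function gradient is preconditioned with the inverse Fisher information, as in separable natural evolution strategies. The function names, the sphere test objective, and all hyperparameter values are hypothetical choices made for illustration.

```python
import numpy as np


def sphere(x):
    """Toy objective to minimize (assumed test function, not from the paper)."""
    return np.sum(x ** 2, axis=-1)


def single_step_pg_sketch(objective, dim=2, n_gen=150, pop=16,
                          lr_mu=0.5, lr_sigma=0.2, seed=0):
    """Single-step policy-gradient search with a diagonal Gaussian density.

    Each 'episode' is one draw x ~ N(mu, diag(sigma^2)) and one evaluation.
    All defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    mu = np.full(dim, 2.0)       # mean of the search density
    log_sigma = np.zeros(dim)    # log standard deviations (sigma = 1 at start)

    for _ in range(n_gen):
        sigma = np.exp(log_sigma)
        eps = rng.standard_normal((pop, dim))
        x = mu + sigma * eps     # one generation of candidates

        # Reward is the negated cost; advantages are z-scored per generation.
        reward = -objective(x)
        adv = (reward - reward.mean()) / (reward.std() + 1e-8)

        # Score-function gradients of log N(x; mu, diag(sigma^2)),
        # preconditioned by the inverse Fisher information:
        #   mean:      sigma^2 * (eps / sigma) = sigma * eps
        #   log-sigma: (eps^2 - 1) / 2
        grad_mu = (adv[:, None] * sigma * eps).mean(axis=0)
        grad_log_sigma = (adv[:, None] * 0.5 * (eps ** 2 - 1.0)).mean(axis=0)

        mu = mu + lr_mu * grad_mu
        log_sigma = log_sigma + lr_sigma * grad_log_sigma

    return mu, np.exp(log_sigma)


if __name__ == "__main__":
    mu, sigma = single_step_pg_sketch(sphere)
    print("final mean:", mu)
    print("final std :", sigma)
    print("cost at mean:", sphere(mu))
```

The mean and standard-deviation updates have the same form as an evolution strategy's recombination and step-size adaptation, which is the similarity the abstract points to. A full-covariance density and a neural-network parameterization would be needed to approach the method the abstract describes.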

Related articles

An Inference-based Policy Gradient Method

In the pursuit of increasingly intelligent learning systems, abstraction plays a vital role in enabling sophisticated decisions to be made in complex environments. The options framework provides a formalism for such abstraction over sequences of decisions. However, most models require that options be given a priori, presumably specified by hand, which is neither efficient nor scalable. Indeed, it...

Adaptive Step-Size for Policy Gradient Methods

In the last decade, policy gradient methods have significantly grown in popularity in the reinforcement-learning field. In particular, they have been largely employed in motor control and robotic applications, thanks to their ability to cope with continuous state and action domains and partially observable problems. Policy gradient research has been mainly focused on the identification of effe...

Policy Gradient Method for Team Markov Games

The main aim of this paper is to extend the single-agent policy gradient method for multiagent domains where all agents share the same utility function. We formulate these team problems as Markov games endowed with the asymmetric equilibrium concept and based on this formulation, we provide a direct policy gradient learning method. In addition, we test the proposed method with a small example p...

THE CMA EVOLUTION STRATEGY BASED SIZE OPTIMIZATION OF TRUSS STRUCTURES

Evolution Strategies (ES) are a class of Evolutionary Algorithms based on Gaussian mutation and deterministic selection. Gaussian mutation captures pair-wise dependencies between the variables through a covariance matrix. Covariance Matrix Adaptation (CMA) is a method to update this covariance matrix. In this paper, the CMA-ES, which has found many applications in solving continuous optimizatio...

Journal

Journal title: Neural Computing and Applications

Year: 2022

ISSN: 0941-0643, 1433-3058

DOI: https://doi.org/10.1007/s00521-022-07779-0